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Abstract: We assume a drift condition towards a small set and bound the 
mean square error of estimators obtained by taking averages along a single 
trajectory of a Markov chain Monte Carlo algorithm. We use these bounds 
to construct fixed-width nonasymptotic confidence intervals. For a possibly 
unbounded function / : A" — > ij, let 7 = J'^ f{x)'rr{x)dx be the value of 

interest and it,n = (-'^Z'^) X/i— " ^ ■^("'^») MCMC estimate. Precisely, 
we derive lower bounds for the length of the trajectory n and burn-in time 
t which ensure that 

Pi\it,n-I\ <e)>l-a. 
The bounds depend only and explicitly on drift parameters, on the norm 
of /, where V is the drift function and on precision and confidence param- 
eters £, a. Next we analyse an MCMC estimator based on the median of 
multiple shorter runs that allows for sharper bounds for the required total 
simulation cost. In particular the methodology can be applied for comput- 
ing Bayesian estimators in practically relevant models. We illustrate our 
bounds numerically in a simple example. 
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1. Introduction 

An essential part of many problems in Bayesian inference is the computation of 
analytically intractable integral 



I = f{x)TT{x)dx, 



X 
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where f{x) is the target function of interest, X is often a region in high-dimensio- 
nal space and the probabihty distribution tt over X is usuaUy known up to a 
normahzing constant and direct simulation from tt is not feasible (see e.g. [8], 
[27]). A common approach to this problem is to simulate an ergodic Markov 
chain (X„)„>o, using a transition kernel P with stationary distribution tt, which 
ensures that X„ — > tt in distribution. Thus, for a "large enough" rip, X„ for 
n > no is approximately distributed as tt. Since a simple and powerful algorithm 
has been introduced in 1953 by Metropolis et al. in a very seminal paper 
various sampling schemes and approximation strategies have been developed 
and analyzed ([27], [8]) and the method is referred to as Markov chain Monte 
Carlo (MCMC). 

The standard method is to use average along a single trajectory of the un- 
derlying Markov chain and discard the initial part to reduce bias. In this case 
the estimate is of the form 

, t+n-\ 
i=t 

and t is called the burn-in time. Asymptotic validity of ([T]) is ensured by a 
law of large numbers that holds in this setting under very mild assumptions 
[22]. Various results justify the choice of H]). In particular, for reversible chains, 
Geyer in [14] shows that subsampling is ineffective (in terms of asymptotic 
variance) and Chan and Yue in ^ consider asymptotic efficiency of ([T]) in a 
class of linear estimators (in terms of mean square error). Asymptotic behaviour 
of It^n is usually examined via a Central Limit Theorem (CLT) for Markov 
chains c.f. [M] [HI [32] . One constructs asymptotic confidence intervals, based 
on the CLT and consistent estimators of the asymptotic variance, as described 
in [m [22l [m [6]. Asymptotic behaviour of the mean square error of /o,n in 
the 1/— uniformly ergodic setting has been also studied by Mathe in [55j using 
arguments from interpolation theory. 

The goal of this paper is to derive explicit lower bounds for n and t in ([T]) 
that ensure the following condition: 

Pi\It,n -I\<e)>l-a, (2) 

where e is the precision of estimation and 1 — a, the confidence level. We insist 
on obtaining bounds which depend only on e, a and computable characteristics 
of the transition kernel P and function /. To decrease the total simulation cost, 
apart from we also consider a nonlinear estimator based on the median of 
multiple shorter runs. 

Results of this or related type have been obtained for finite or compact state 
space X and bounded target function / in [2 [TBI ISZ] ■ Niemiro and Pokarowski 
in [31j give results for relative precision estimation. For uniformly ergodic chains 
and bounded function /, Hoeffding type inequalities are available in [ITl [25l [26] 
and can easily lead to ([2]). 

Tail inequalities for bounded functionals of Markov chains that are not uni- 
formly ergodic were considered in [10], [1] and [11] using regeneration techniques. 
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Computing explicit bounds from these results may be possible with additional 
work, but we do not pursue it here. 

If the target function / is not bounded and the Markov chain is not uni- 
formly ergodic, rigorous nonasymptotic results about finite sample behaviour 
of It^n are scarce. Tail inequalities valid in this setup have been established by 
Bertail and Clemengon in [7] by regenerative approach and using truncation ar- 
guments. However, they involve non-explicit constants and can not be directly 
applied to derive lower bounds on t and n. In [53] a result analogous to ^ is es- 
tablished for a sequential-regenerative estimator (instead of /*,«). The approach 
of |24j requires identification of regeneration times. In many problems of prac- 
tical interest, especially in high dimension, regeneration schemes are difficult to 
implement [T5l [38]. 

Our approach is to assume a version of the well known drift condition to- 
wards a small set f Assumption 12.11 in Section [2]), which is the typical setting 
when dealing with integrals of unbounded functions on noncompact sets. Un- 
der this assumption in Section [3] we bound the mean square error of It^n- Our 
main Theorem 13.11 exploits the result of Baxendale [3] . In Section |4] we study 
confidence estimation ^ and obtain explicit lower bounds on n and t in terms 
of drift parameters defined in Assumption 12. li the T/— norm of /, where V is 
the drift function (for definitions see Section [l.ip and estimation parameters e, 
a. Our bounds are designed to minimise the total simulation cost t + n. The 
estimation scheme is then refined via an elementary exponential inequality for 
a nonlinear estimator, a median of multiple shorter runs. In Section O we give 
an illustrative toy example. 

The emphasis in our paper is on unbounded /, noncompact X and nonuni- 
formly ergodic Markov chains, because this is a setting which usually arises 
when computing Bayesian estimators in many practically relevant models. We 
note that drift conditions required to apply our approach have been established 
in particular for the important hierarchical random effects models in '23] and 
for a more general family of linear models in '20' . 

1.1. Notation and Basic Definitions 

Throughout this paper, tt represents the probability measure of interest, defined 
on some measurable state space {X,J^) and f : X ^ R, the target function. 
Let {Xn)n>o be a time homogeneous Markov chain on {X,J^) with transition 
kernel P. By ttq denote its initial distribution and by tt* its distribution at time 
t. Let I — f{x)Tr{dx) be the value of interest and It.n = ^ J2*itt~^ fi^i) its 
MCMC estimate along one walk. 

For a probability measure fi and a transition kernel Q, by fiQ we denote 
a probability measure defined by ^Q(-) j^Q{x,-)^,{dx). In this conven- 
tion TTj = TToP*. Furthermore if 5' is a real- valued function on X, let Qg{x) := 
Jx 9{y)Qi^i (^y) and ng := g{x)ii{dx). We will also use E^g for ^g. li ^ = 6x 
we will write instead of E^. For transition kernels Qi and Q2, Q1Q2 is also 
a transition kernel defined by QiQ2{x, ■) := Q2iy, ■)Qiix, dy). 



K. Latuszynski et al. /Rigorous MCMC under Drift Condition 4 

Let V : X [\,oo) be a measurable function. For a measurable function 
g : X ~f R define its V-norm as 

wW := sup — — . 

xex V{x) 

To evaluate the distance between two probability measures and fj,2 we use 
the V-norm distance, defined as 

\\^J.l - ^i2\\v ■■= sup Ifiig - fi2g\ ■ 

\g\<v 

Note that ioi V = 1 the T^— norm distance 1 1 • | |y amounts to the well known total 
variation distance, precisely — ^2|iy = 2||/ii — /i2||tv ■— 2sup^gjp — 

M2(A)|. 

Finally for two transition kernels Qi and Q2 the V-norm distance between 
Qi and Q2 is defined by 

\\\Ql - Q2\\\V ■■= \\\Ql{x,-) - Q2{x,-)\\v\,, = sup 



x<£X 



V{x) 



For a probability distribution /i, define a transition kernel fJ.{x,-) :— /^(•), to 
allow for writing |||(5 — mIIIv a-nd \\\fii — fJ-2\\\v- Define also 

Bv :-{/:/: A- ^i?,|/|y<(^}. 

Now if IIIQi — Q2|||y < 00, then Qi — Q2 is a bounded operator from By to 
itself, and \ \\Qi — Q2\\\v is its operator norm. See [3^ for details. 

In the sequel we will work with geometrically ergodic Markov chains. A 
Markov chain is said to be geometrically ergodic if 

\\6xP^ — T^Wtv < M{x)j"' , for TT — a.e. X, and for some 7 < 1. 

In particular, if M{x) < M then the chain is said to be uniformly ergodic. 
Geometric ergodicity is equivalent to existence of a drift function V towards a 
small set (see [32] and c.f. Assumption l2.ip and consequently also to V— uniform 
ergodicity which is defined by the following condition. 

\\S^P" ~tt\\v < MV{x)Y' or equivalently |||P" - 7r|||v < 
for some M < 00 and some 7 < 1. 



2. A Drift Condition and Preliminary Lemmas 



We analyze the MCMC estimation under the following assumption of a drift 
condition towards a small set, c.f. [3]. 

Assumption 2.1. 
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(A.l) Small set. There exist C C A", > and a probability measure v on X, 
such that for all .t G C and A Q X 



(A. 2) Drift. There exist a function V : X ^ [l,oo) and constants A < 1 and 
K < oo satisfying 



(A. 3) Strong Aperiodicity. There exists f3 > such that (3i'{C') > (3. 

In the sequel we refer to /?, V, A, K, [3 as drift parameters. 

This type of drift condition is often assumed and widely discussed in Markov 
chains literature since it implies geometric ergodicity and a CLT for a conve- 
nient class of target functions, see [30] for details and definitions. Computable 
bounds for geometric ergodicity parameters under drift conditions allow to con- 
trol the burn-in time t and the bias of MCMC estimators in practically relevant 
models. Substantial effort has been devoted to estabhshing such bounds, c.f. the 
survey paper by Roberts and Rosenthal |32j and references therein. Particular 
references include e.g. Rosenthal [S^ or Roberts and Tweedie [34] for bounds 
on the total variation distance. Since we deal with unbounded functions, in the 
sequel we make use of the y— uniform ergodicity convergence bounds obtained 
by Baxendale in [3] (c.f. Douc at al. [12] and Fort |13J). In the drift condition 
setting and using explicit convergence bounds, our goal is to control not only 
the burn-in time t, but also the length of simulation n. 

Theorem 2.2 (|30j.p?j). Under Assumption\2^{X)n>o has a unique stationary 
distribution n and nV < oo (130} ). Moreover (Theorem 1.1 of 13J), there exists 
p < 1 depending only and explicitly on /?, /?, A and K such that whenever p < 
7 < 1 there exists M < oo depending only and explicitly on 7, /?, /?, A and K 
such that for all n > 



Formulas for p = p{(3,\,K,(3) and M = M{-f, P, X, K, f3) established in [3] 
are given in Appendix |X| and are used in Section |5| To our knowledge the above- 
mentioned theorem gives the best available explicit constants. However this is a 
topic of ongoing research (c.f. [4]). We note that improving ergodicity constants 
in Theorem l2 . 21 will automatically result in tightening bounds established in our 
paper. 

Corollary 2.3. Under Assumvtion \2.1\ 



P{x,A) > (}iy{A). 




x&C. 




(3) 



IkoF" -'Tt\\v< mminoV, \\7r0 - tt\\v}MY\ 



where M and 7 are such as in Theorem 
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Proof. From Theorem [Owe have ||P"(a;, ■)-'!t{-)\\v < Mj'^Vix), which yields 



7ro0.f7" > / \\P^{x,-)-7ri-)\\vMdx)> sup / \P^{x, ■)9 - 7rg\Md^) 

Jx \g\<vJx 

> sup KoP"5 - = IkoP" - 7r||y. 

\9\<V 

Now let 6\/ — infxex V{x) and let /xi, ^2 be measures. Clearly — 
A*2(a^, Ollv is constant in x and therefore 

III III ~ f^2{x,-)\\v \\fJ'l-f^2\\v 

A^i - V = sup — = . 

X V[x) bv 

Since ||| • |||y is an operator norm and tt is invariant for P, we have 
|koP"-^lk - 6i^||ko^'"-^ll|y = 6y||lko-7r)(P"-7r)|||v 

< 6\/||ko~''r||klll-f"~''''llk = lko~i'lklll-f"~''''llk- 

< |ko-7r||yM7". 

□ 

Next we focus on the following simple but useful observation. 

Lemma 2.4. // for a Markov chain {Xn)n>a on X with transition kernel P 
Assumvtion \2. 1\ holds with parameters (3,V, X, K, f3, it holds also with j3r '■— P, 
Vr := Xr Al/*-, Kr := K^l\ /3r := /? for every r > 1. 

Proof. It is enough to check (A. 2). For a; ^ C by Jensen inequality we have 

XV {x) > V{y)P{x, dy) > (^j ^ V{yY'-P{x, dy)^ 
and hence PVr{x) < X^^'^Vr{x), as claimed. Similarly for x e C we obtain 

PVrix) < K^/\ □ 



Lemma [2 .41 together with Theorem 12.21 yield the following corollary. 
Corollary 2.5. Under Assumvtion \2.1\ we have 

where Mr and 7^ are constants defined as in Theorem \2.S\ resulting from drift 
parameters defined in Lemma\2.4\ 



Integrating the drift condition with respect to tt yields the following bound 
on ttV. 

Lemma 2.6. Under Assumvtion \2. 1\ 

,^^K-X K-X 
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Let fc — f ^ T^f- The next lemma provides a bound on ||/c|''|v in terms of 
'^\^\v without additional effort. 



Lemma 2.7. Under Assumvtion \2.1\ 

by 

where by = ini^ex V{x) and Kp,x = -^^j^^pTF"-- 

Proof. Note that ttV^^^ < TT{C')Kp^\ < Kp^\ by Lemma [2761 and proceed: 

- Jx v{x) ' V b\/^\\m\/^j 

□ 



3. MSE Bounds 

By MSE{It^n) we denote the mean square error of i.e. 

Nonasymptotic bounds on M SE{It^n) are essential to establish confidence esti- 
mation Q and are also of independent interest. The main result of this section 
is the following 

Theorem 3.1 (MSE Bounds). Assume the Drift Condition \2.1\ holds and ~ 
ttq. Then for every measurable function f : X ~^ R, every p > 2 and every 



MSEiIo,n) < 



Wfcl^fJ" A , 2M,7A / ^ ^ Mmin{7roy,||7ro-^Hv} 
n \ 1 - 7r / V - 7) 



(4) 

where fc ~ f ^ T^f md constants M,"f, Mr,"fr depend only and explicitly on 
(3,(3, A and K from Assumvtion \2.1\ as in Theorem \2.2i and Corollary \2.4\ 

We emphasise the most important special case for p — r — 2 as a corollary. 

Corollary 3.2. In the setting of Theorem ] 3. 11 we have in particular 

n \ 1 - 72 / V "(1 - 7) / 
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Remark 3.3. The formulation of the foregoing Theorem 13.11 is motivated by a 
trade-off between small V and small A in Assumption l2.1l It should be intuitively 
clear that establishing the drift condition for a quickly increasing V should result 
in smaller A at the cost of bigger irV. So it may be reasonable to look for a valid 
drift condition with V > C\\fc\^\ for some p > 2 instead of the natural choice 
of p — 2. Lemma [231 should strengthen this intuition. 

Remark 3.4. For evaluating min{7roy, Uttq — 7r|| one will often use the obvious 
bound minjTToV, ||vro — 7r|| v^} < ttoV, because ttqV depends on ttq which is users 
choice, e.g. a deterministic point. Also, in some cases a fairly small bound for 
ttV should be possible to obtain by direct calculations, e.g. if tt is exponentially 
concentrated and is a polynomial of degree 2. However, in absence of a better 
bound for ttV, Lemma l2.6l is at hand. Similarly Lemma |2 . 71 bounds the unknown 
value ||/c|''|y^ in terms of Note that in applications both / and V have 

explicit formulas known to the user and ||/|^|y can be evaluated directly or 
easily bounded. 

Remark 3.5. Let ct^s(/) denote the asymptotic variance from the CLT for 
Markov chains (see e.g. [SHIS]). Since in the drift condition setting 

„ , „ • > 1 as Qo, 

we see that the bounds in Theorem 13.11 and Corollary 13.21 have the correct 
asymptotic dependence on n and are easy to interpret. In particular 7rT^|/^|v 
in Corollary 13.21 should be close to VarT^f for an appropriate choice of V, 
the term 2M272/(1 — 72) corresponds to the autocorrelation of the chain and 
MminjTToV, Uttq — 7r||y}/n(l — 7) is the price for nonstationarity of the initial 
distribution. In fact Theorem 13 . II with ttq = vr yields the following bound on the 
asymptotic variance 

aUf) = lim n£;.[/o,„ - if < ^F||/,nf f 1 + ^ 
n^oo ' y i — 7,. 

Proof of Theorem [3J[ Note that \f\yi/^ = ll/TIv- Without loss of generality 
consider fc instead of / and assume ||/c|''|y — 1- In this setting \fc\v < 1, 
Var^fc = Tifl < nV, MSE{h^n) = E^oUomf, and also for every r G [^,p], 

i/ciyi/. <ii/cr/nvi/. = 1 and i/eiyi-i/. <ii/cr^/nyi^i/. = 1. 

Obviously 

^ n—l 2 ^'' — 2 71—1 

nMSE{io,n) = -Y.^-ofc{X,f + E^ofMfciXj). (6) 

We start with a bound for the first term of the right hand side of ©. Since 
fci^) — ^(2^)' use Corollary 12.31 for Let C = niin{7roy, \\tto — n\\v} and 
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proceed 

1 ""^ 1 ""^ 1 ""^ CM 

- y: E.Mx^r ^oP'fc < -/f +- E ^ ^^^+^7^- (7) 

n n n ■^-^ nil — j) 

1=0 1=0 i=0 ^ 

To bound the second term of the right hand side of ([6]) note that |/c| < V'^^^ 
and use CoroUarv l2.5l 

n—2 71—1 ^ n—2 n—1 

2^0 ^'—2 + 1 2 — J — 2+1 

2 n—2 n—1 



^ 5E E -0(P^(|/c||P-7e|)) 



n 

i=0 j=i+l 

2M, °° 



^ ^ E E ^r-o iM^^ 



l/r 



1=0 j=i+l 
n — 2 

Since < V^/'' and < yi-i^, also \fcV^^''\ < V and we use CoroUary 
for IfclV^^"^. 



Combine ([7|) and ([8|) to obtain 



1 ^ 7r y V "■(! - 7) 

□ 

Theorem 13.11 is expHcitly stated for /o,„, but the structure of the bound is 
flexible enough to cover most typical settings as indicated below. 

Corollary 3.6. In the setting of Theorem \3.1\ 



n \ 1 — 7r 

USE A,.,, < ^M&l(,^?^)(,y^ 11X^1 ,10) 



1 ~ 7r / \ '^-(1 - 7) 

MbE[It^n) < 1 + ttK H — , «/ TTo = dx- (11) 

1 - 7r / V «(1 - 7) 
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Proof. Only ((TT|) needs a proof. Note that Xt ^ S^PK Now use Theorem [22] to 
see that WS^P* ^ ""ll v 1^ M'^*V{x), and apply Theorem 13 . 1 1 with ttq = S^P* ■ □ 

Bound ([9]) corresponds to the situation when a perfect sampler is available 
and used instead of burn-in. For deterministic start without burn-in and with 
burn-in, (fTO]) and (fTTj) should be applied respectively. 

4. Confidence Estimation 

Confidence estimation is an easy corollary of MSE bounds by the Chebyshev 
inequality. 

Theorem 4.1 (Confidence Estimation). Under Assumption \2.1l let 



1 + (12) 

Mmin{7roy, IKo ~ ^|k}||/c|1v/^ ^ 2M^\ ^^^^ 



(14) 



s^a{l — 7) \ 1 — 7r 



_ M^j'Vix)\\mlf^ , 2M.7>- 
b+^b^ + ^t) 

n{t) = 7i , (15) 



2 



£^a(l — 7) \ 1 ~ 7r 



(16) 



Th 



en 



^"(|/o,« - /| < e) > 1 - a, 1/ Xo-^o, n> ^ • (17) 



2 



- /| < £) > 1 - a, ^f { i>max|0,log, (^H±^@li^j|, (18) 

71 > n{t). 

Remark 4.2 (Leading term). The above boimds in (|18p give the minimal length 
of the trajectory (i + n) resulting from (jlip . The leading term of the bound on 
n is 

e-^a 1 — 72 

(where we took p = r = 2 for simplicity). Quantity TTV\f^\v should be of the 
same order as VavTrf, thus a term of this form is inevitable in any bound on n. 
Next, which results from Chebyshev's inequality, is typical and inevitable, 
too. The factor will be reduced later in this section to log(a~^) for small a 
by Lemma im and Algorithm g^l The term 1 + which roughly speaking 

bounds the autocorrelation of the chain, is the bottleneck of the approach. Here 
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good bounds on 7 and the somewhat disregarded in hterature M = M(^) are 
equally important. Improvements in Baxendale-type convergence bounds may 
lead to dramatic improvement of the bounds on the total simulation cost (e.g. 
by applying the preliminary results of [4]). 

Remark 4.3. The formulation of Theorem 14. II indicates how the issue of a suf- 
ficient burn-in should be understood. The common approach is to describe t 
as time to stationarity and to require that t* = t{x, e) should be such that 
p['K,5xP* ) < £ (where /o(-,-) is a distance function for probability measures, 
e.g. total variation distance, or norm distance). This approach seems not 
appropriate for such a natural goal as fixed precision of estimation at fixed con- 
fidence level. The optimal burn-in time can be much smaller then t* and in 
particular cases it can be 0. Also we would like to emphasise that in the typical 
drift condition setting, i.e. if X is not compact and the target function / is not 
bounded, ||7rt — 7r||tv does not even imply ttj/ tt/. Therefore a 1^— norm 
with \f\v < 00 should be used as a measure of convergence. 

Proof of Theorem \4-l\ From the Chebyshev's inequality we get 



P{\It,n -I\<e) = I- P{\it.n -I\>e) 

> 1 - M^^^IhA > 1 ^ a if MSEiit,,)<e^a. (19) 



To prove (fT7|) set C — minl-TroF, \\tto — 7r||y}, and combine (fT9|) with (jH) to get 



The only difference in (fT8|) is that now we have c(t) defined by (fT4|) instead of c. 
It is easy to check that the best bound on t and n (i.e. which minimizes t + n) 
is such that 

n>n{t) and i > max {0, min{i G : n'(<) > -1}} , 

where n{t) is defined by (fT5|) and n'{t) = ■^n{t). Standard calculations show 
that 

min{i e N : n'{t) > -1} = min{t G N : (7*)^^^ In^ 7 - 7*45 - 6^ < 0}, 
where c is defined by (|16p . Hence we obtain 




t > max < 0, (In 7) ^ In 



( 



2+ v/4 + 62 In^ 7 
cln^ 7 



) 



and 



n > n{t). 



This completes the proof. 



□ 
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Next we consider an alternative nonlinear estimation scheme, the so called 
"median trick" (introduced in [TO] in the computational complexity context and 
further developed in [31]) that allows for sharper bounds for the total simulation 
cost needed to obtain confidence estimation for small a. The following simple 
lemma is taken from a more general setting of Section 2 in |31j . 

Lemma 4.4. Let m ^ N he an odd number and let Ii, . . . , /,„ be independent 
random variables, such that P{\Ik I\ <e)> 1 — a> 1/2, for k — 1, . . . ,m. 
Define I := med{/i, . . . , /,„}. Then 

Pili-I\<s)>l-a, ./ m>^^^^^. (20) 

Hence confidence estimation with parametesrs e, a can be obtained by the 
following Algorithm 14.51 

Algorithm 4.5 (MA: median of averages). 

1. Simulate m independent runs of length t + n of the underlying Markov 
chain, 

2. Calculate m estimates of I, each based on a single run. 



n ^-^ 

i=t 

3. For the final estimate take 

i = med{/i, . . . 

Theorem 14.11 should be used to find t and n that guarantee confidence esti- 
mation with parameters e, a and m results from Lemma 14.41 The total cost of 
Algorithm 14.51 amounts to m(t + rt) and depends on a (in addition to previous 
parameters). The optimal a can be found numerically, however a = 0.11969 is 
an acceptable arbitrary choice (cf. [31]). 



5. A Toy Example - Contracting Normals 

To illustrate the results of previous sections we analyze the contracting normals 
example studied by Baxendale in [3] (see also [34], [33] and [36])) where Markov 
chains with transition probabilities P{x, •) = N{9x, 1 — (F) for some parameter 
9 € (—1,1) are considered. 

Similarly as in ^ we take a drift function V{x) = \ + and a small set 

C = [-d,d] with d > 1, which allows for A = 6*2 + ^^ip^ < 1 and if = 
2 + 9^{d'^ — 1). We also use the same minorization condition with v concentrated 
on C, such that i3v{dy) = min:rec(27r(l - e'2))-i/2 cxp(-^i£|^)dy. This yields 
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^ = 2[$(^^±^)-$(-^g=)], where $ denotes the standard normal cumulative 
distribution function. 

Baxendale in [5] indicated that the chain is reversible with respect to its 
invariant distribution tt — N for all 8 e (—1,1) and it is reversible and 
positive for 9 > Q. Moreover, in Lemma [5. II we observe a relationship between 
marginal distributions of the chain with positive and negative values of 6. By 
C{Xn\XQ, 9) denote the distribution of X„ given the starting point Xq and the 
parameter value 0. 

Lemma 5.1. 

c(Xn\x^,e) = c{x,,\{-iyxo,-e). (21) 

Proof. Let Zi, Z2, ... be an iid A^(0, 1) sequence, then 

n 

c{Xr,\Xo,e) = /:(rxo + ^r-Vi-e'^fc) 

fc=i 

n 

= /:((-0)"(-i)"Xo + Vi - e^Zk) 

fc=i 

= /:(x„|(-i)"Xo,-0), 

and we used the fact that and —Zk have the same distribution. □ 
Therefore, ii 9 >Q then from Theorem 12.21 we have 

||£(X„|Xo,0) -A\v< M7"^(^o) = M7"(l + ^2), (22) 

with M and 7 computed for reversible and positive Markov chains (see Appendix 
IA.3l for formulas). For 6* < we get the same bound (|22p with exactly the same 
M, 7 by Lemma l5.ll and the fact that V{x) is symmetric. 

The choice of V{x) ~ l + x^ allows for confidence estimation of f{x)n{dx) 
if \P\v < 00 for the possibly unbounded function /. In particular the MCMC 
works for all linear functions on X. We take f{x) = x where \P\v = 1 as an 
example. We have to provide parameters and constants required for Theorem 
14.11 In this case the optimal starting point is Xq = since it minimizes V(x). 
Although in this example we can compute ttV = 2 and \f'^\v = 1; we also 
consider bounding -kV and \p\v using Lemma and Lemma [1?7] respectively. 



a 


algorithm 


setting 1 
m t n total cost 


setting 2 
m t n total cost 


reality 
m t n total cost 


.1 


one walk 
MA 


1 218 6.46e+09 6.46e+09 


1 229 l.Ole+08 l.Ole+08 


1 811 811 


10"'' 


one walk 
MA 


1 218 6.46e+ll 6.46e+ll 
15 218 5.40e+09 8.10e+10 


1 229 l.Ole+10 l.Ole+10 
15 229 8.45e+07 1.27e+09 


1 3248 3248 
7 726 5082 


10"'' 


one walk 
MA 


1 218 6.46e+13 6.46e+13 
27 218 5.40e+09 1.46e+ll 


1 229 l.Ole+12 l.Ole+12 
27 229 8.45e+07 2.28e+09 


1 5853 5853 
13 726 9438 
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Table 1. Bounds for the one walk algorithm and the median of averages Algorithm 
|^.5| (MA) for 6 = .5, precision parameter e = .1 and different values of the confidence 
parameter a. Baxendale's uniform ergodicity parameters in this example are p = 
.895, p2 = .899. Optimizing the total simulation cost results in 7 = .915, 72 = 
.971, M = 3.64e + 04, M2 = 748. Setting 1 uses Lemmas lOl and Wl\ whereas in 
setting 2, -kV and \ fc\v are computed directly. The bounds are compared to reality 
obtained empirically in a simulation study. 

Examples of bounds for t and n for the one walk estimator, or t, n and m 
for the median of averages (MA) estimator are given in Table 1. The bounds 
are computed for C = [—d,d] with d — 1.6226 which minimizes p2 (rather than 
p) for 9 = 0.5. Then a grid search is performed to find optimal values of 7 and 
72 that minimize the total simulation cost. Note that in Baxendale's result, the 
constant M depends on 7 and goes relatively quickly to cx3 as 7 ^ p. This is 
the reason why optimal 7 and 72 are far from p and p2 and turns out to be 
the bottleneck of Baxendale's bounds in applications (c.f. Remark 14. 2p . Also 
for small a = 10^^, the m ^ 27 shorter runs have a significantly lower bound 
on the required total simulation effort then the single long run. MA is thus 
more mathematically tractable. However, in reality MA is about 7r/2 times less 
efRcient then the one walk estimator - a phenomenon that can be inferred from 
the standard asymptotic theory. 

R functions for computing this example and also the general bounds resulting 



from Theorem l4.1l are available at http://www2.warwick.ac.uk/fac/sci/statistics/staff/research/latuszynski/ 



6. Concluding Remarks 

The main message of our paper is a very positive one: current theoretical knowl- 
edge of Markov chains reached the stage when for many MCMC algorithms of 
practical relevance applied to difficult problems, i.e. estimating expectations of 
unbounded functions, we are able to provide a rigorous, nonasymptotic, a priori 
analysis of the quality of estimation. This is much more then the often used in 
practice visual assessment of convergence by looking at a graph, more sophisti- 
cated a posteriori convergence diagnostics, bounding only burn in time or even 
using asymptotic confidence intervals, and should replace it, where possible. 

The bounds derived in our paper are admittedly conservative, as observed in 
Section [5] We note that this is the case also for explicit bounds on convergence 
in total variation norm established under drift conditions. Nevertheless drift 
conditions remain the main and most universal tool in obtaining nonasymptotic 
results for general state space Markov chains. 

For regenerative algorithms alternative bounds established in [21] are typi- 
cally tighter then those resulting from our Section 3) However, the algorithms 
proposed there are more difficult to implement in practically relevant examples. 



Appendix A: Formulas for p and M 

For the convenience of the reader we repeat here the formulas from 3] that play 
a key role in our considerations. 
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In the sequel the term atomic case and nonatomic case refers to ,9 = 1 and 
P < 1 respectively. U (3 < 1, define 



logA- 



r 1, ift.(C) = l, 

1 + ( log -J ) / (log A~ ^ ) , otherwise. 



Then let 

PR' 



i?o = min{A^\(l-^)-i/"i}, i(J?) = < i-(i-/3)fi°i' i^^^^o, 

I OO if = iio- 



j4.J. Formulas for general operators 

For /3 > 0, i? > 1 and L > 1, let i?i = Ri{P,R,L) be the unique solution 
re (1, -R) of the equation 

r-1 e2/3(i?-l) 



r(log(E/r))2 8(L-1) 

and for 1 < r < define 

2/3 + 2(logiV)(log(i?/r))-i - SNer^r - l)r"' {log{R/r))-^ 
' ^ (r- l)[/3-8iVe-2(r- l)r-i(log(i?/r))-2] 

where iV = (L - l)/(i? - 1). 

For the atomic case we have p = A~^, A~^iir) and for p < 7 < 1, 

„ ^ max(A.K A/,) _^ ^(^:^^,(^-.^^_,-._,-^) 

7- A 7(7 -A) 

(j^-A/7)ma^(A,Jf-A) A(j^-l) 

(7-A)(l-A) +(7-A)(l-A)- ^ > 

For the nonatomic case let i? = argmaxi<fl<i{o Ri{p, R, L{R)). Then we have 
p=l/Ri{l3,R,L{R)) and for p < 7 < 1, 

^ ^ 7-"2-i(j^^_A) ^ / ^ma^(A,Jr-A) (1 - ^)(7-"^ - 1) \ 

(7-A)[l-(l-^)7-«^]2 1-A 7-^-1 J 

max(A,j.-A/7) ^ ^7— ^^^(^^7 - A) 

7^"2A(x_l) i^[i^7- A-/3(7-A)] 



(1 - A)(7 - A)[l - (1 - /3)7-"^] 7^(7 - A)[l - (1 - (3)^-'^^] 
K-X-p{l-X ) 
(l-A)(l-7) 



+ 7 ((7-"= - 1) + (1 - /3)(7""^ - • (24) 
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A. 2. Formulas for self-adjoint operators 

A Markov chain is said to be reversible with respect to tt if Pf{x)g{x)TT{dx) = 
J.^ f{x)Pg{x)TT{dx) for all f,g G i^{7r). For reversible Markov chains the fol- 
lowing tighter bounds are available. 
For the atomic case define 

r mm{X-\,rs}, if K>X + 2f3, 
\ X-\ if K<X + 2f3, 

where is the unique solution of 1 + 2f3r = r^+{iogK){\og\-^) ^ rj,^^^ ^ ^ ^-i 
and for p < 7 < 1 take M as in (gS]) with Ki{^^'^ , P, X^^ , X~''- K) replaced by 
K2 = 1 + 1/(7 -p). 

For the nonatomic case let 

„ ^ / r,, if L(i?o) > l + 2/3i?o, 
^2 \ if L{Ro) < 1 + 2/3i?o, 

where is the unique solution of l + 2/3r = Then p = i?2^^ and for p < j < 
1 take M as in dill) with Ki{-f^'^ , (3, R, L{R)) replaced by K2 = 1 + \f^/{-/- p). 

A. 3. Formulas for self-adjoint positive operators 

A Markov chain is said to be positive if Pf{x)f{x)Tr{dx) > for every 
/ G i^(7r). For reversible and positive markov chains take M's as in Section lA.2l 
with p = X in the atomic case and p = Rq^ in the nonatomic case. 
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